This report records updates to the COSMIC data sets, along with some primary analysis results.
Version: R version 3.1.3 (2015-03-09)
Markdown file: Report_COSMIC_Data.Rmd
Last update of Markdown file: 2015-10-04
Format: [Project]_[Participant]_[Gender]_[Smoking]_[Diagnosis]
Example: COSMIC_1101_M_N_H
| Label | Identifier for | Value | Value description | Possible values |
|---|---|---|---|---|
| Project | Project name | COSMIC | COSMIC project | COSMIC |
| Participant | Study participant | 1101 | The sample with ID 1101 | Any alphanumeric value |
| Gender | Gender | M | Male | M: male; F: female |
| Smoking | Smoking history of the participant | N | Non-smoker | N: non-smoker; S: smoker; E: ex-smoker |
| Diagnosis | Diagnosis of COPD | H | Healthy | H: healthy; C: COPD |
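The barcode layout above can be checked mechanically. The following is a minimal Python sketch, not the project's own code (which is in R). It assumes the underscore separator used in the example and that the participant field contains only letters and digits; `parse_barcode` is a hypothetical helper name.

```python
import re

# Pattern derived from the table above. Assumptions: fields are joined by
# underscores (as in the example COSMIC_1101_M_N_H) and the participant ID
# consists of letters and digits only.
BARCODE_RE = re.compile(
    r"^(?P<project>COSMIC)"
    r"_(?P<participant>[A-Za-z0-9]+)"
    r"_(?P<gender>[MF])"
    r"_(?P<smoking>[NSE])"
    r"_(?P<diagnosis>[HC])$"
)


def parse_barcode(barcode):
    """Split a sample barcode into its five labelled fields."""
    match = BARCODE_RE.match(barcode)
    if match is None:
        raise ValueError(f"not a valid COSMIC barcode: {barcode!r}")
    return match.groupdict()


fields = parse_barcode("COSMIC_1101_M_N_H")
print(fields["participant"], fields["gender"], fields["smoking"], fields["diagnosis"])
```

A barcode with an unknown code, such as `COSMIC_1101_X_N_H`, raises `ValueError` instead of being silently accepted.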
Format: [Cell]_[Omic]_[Platform]
Example: BAL_P_DIGE
| Label | Identifier for | Value | Value description | Possible values |
|---|---|---|---|---|
| Cell | Cell types | BAL | Data from BAL cells | BAL; BALF; Exosomes; BEC; PLASMA |
| Omic | Omic study | P | Proteome | P: Proteome; T: Transcriptome; L: Lipidome; M: Metabolome |
| Platform | Technology used to generate omic data | DIGE | Protein expression by DIGE platform | HLA-typing; DIGE; iTRAQ; miR; mRNA; oxylipins; cys-LT; Global metab |
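A data set identifier can be split into its three parts in the same spirit. This is a Python sketch under stated assumptions, not the project's R code: it assumes underscore separators as in the example BAL_P_DIGE, and it treats everything after the second underscore as the platform, because some identifiers used in this report (for example BAL_P_iTRAQ_ONE) contain further underscores. `parse_dataset_id` and `OMIC_NAMES` are hypothetical names.

```python
# Omic codes and their meanings, taken from the table above.
OMIC_NAMES = {
    "P": "Proteome",
    "T": "Transcriptome",
    "L": "Lipidome",
    "M": "Metabolome",
}


def parse_dataset_id(item):
    """Split a data set identifier into cell, omic and platform."""
    # maxsplit=2 keeps a platform suffix that itself contains underscores
    # in one piece, e.g. "BAL_P_iTRAQ_ONE" gives platform "iTRAQ_ONE".
    parts = item.split("_", 2)
    if len(parts) != 3:
        raise ValueError(f"expected Cell_Omic_Platform, got {item!r}")
    cell, omic, platform = parts
    if omic not in OMIC_NAMES:
        raise ValueError(f"unknown omic code {omic!r} in {item!r}")
    return {"cell": cell, "omic": OMIC_NAMES[omic], "platform": platform}


print(parse_dataset_id("BAL_P_DIGE"))
print(parse_dataset_id("BAL_P_iTRAQ_ONE"))
```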
| item | fname | sheet | nrows | scol | ecol | log | normalization | note | X |
|---|---|---|---|---|---|---|---|---|---|
| datasummary | ./rawdata/Data block Incl Excl_summary_2014-11-29.xlsx | data | 119 | 12 | NA | NA | NA | Deleted sample COSMIC_ID 3208 3218 | |
| clinic | ./rawdata/COSMIC clinical_2016-03-31_selected.xlsx | data | 140 | NA | 124 | NA | NA | NA | |
| BAL_P_DIGE | ./rawdata/BAL_proteomics_DIGE_log10_ratio_2012.xlsx | data | 406 | 2 | 78 | log10 | ratio | NA | |
| BAL_P_iTRAQ_ONE | ./rawdata/BALcell_proteomics_iTRAQ_2016-05.xlsx | dataONE | 1809 | 2 | 70 | log2 | ratio | needs missing value imputation | Proteins detected in at least 75% of the subjects in at least one group are included (6 groups: healthy, smoker and COPD, stratified by gender) |
| BAL_P_iTRAQ_ALL | ./rawdata/BALcell_proteomics_iTRAQ_2016-05.xlsx | dataALL | 940 | 2 | 70 | log2 | ratio | needs missing value imputation | Proteins detected in at least 75% of the subjects in all main groups are included (6 groups: healthy, smoker and COPD, stratified by gender) |
| BAL_T_miR | ./rawdata/BAL_miR_excl QC outl_quantile_log2_2011-10-23.xlsx | data | 896 | 43 | 128 | log2 | Quantile | NA | |
| BAL_T_mRNA | ./rawdata/BAL_T_mRNA_raw.csv | data | 41002 | 6 | 56 | log2 | Quantile | From ./rawdata/BAL_mRNA_All excl clinical_quantile_log2_2012-01-24.xlsx | |
| BEC_T_miR | ./rawdata/BEC_miR_Hsa_quantile_log2_66 samples_63 COSMIC_2014-12-11.xlsx | data | 896 | 43 | 108 | log2 | Quantile | NA | |
| EXO_T_miR | ./rawdata/EXO_miR_quantileAll_log2_64 subjects.xlsx | data | 1214 | 43 | 106 | log2 | Quantile | NA | |
| miRNA_annot_v3.6 | ./rawdata/miRNA-all-v3.6-8x15k-annotation.xlsx | NA | NA | NA | NA | NA | NA | From ./rawdata/miRAnnotation/miRNA-all-v3.6-8x15k-annotation.txt | |
| BALF_M_Oxylip | ./rawdata/Serum_BALF_oxylipins_final incl LODs_Balgoma.xlsx | Oxylip_BALF | 46 | 5 | 124 | NULL | NULL | | |
| Serum_M_Oxylip | ./rawdata/Serum_BALF_oxylipins_final incl LODs_Balgoma.xlsx | Oxylip_serum | 75 | 5 | 124 | NULL | NULL | | |
| Oxylip_annot | ./rawdata/Serum_BALF_oxylipins_final incl LODs_Balgoma.xlsx | KEGG IDS | 41 | 1 | 8 | NULL | NULL | | |
| Serum_M_Non_targeted | ./rawdata/Serum_metabolomics_3 platform_Shama_2016-05.xlsx | data | 1104 | 9 | 144 | NULL | NULL | | |
| Non_targeted_annot | ./rawdata/Serum_metabolomics_3 platform_Shama_2016-05.xlsx | data | 1104 | 1 | 8 | | | | |
| Serum_M_Biocrates | ./rawdata/Serum_metabolomics_3 platform_Shama_2016-05.xlsx | Biocrates result | 79 | 3 | 186 | NULL | NULL | samples in rows, need to transpose | |
| Serum_M_Kynurenine | ./rawdata/Serum_metabolomics_3 platform_Shama_2016-05.xlsx | Kynurenine pathway | 118 | 3 | 6 | NULL | NULL | samples in rows, need to transpose | |
| Serum_M_Sphingolipid | ./rawdata/Serum_metabolomics_3 platform_Shama_2016-05.xlsx | Sphingolipid analysis | 116 | 3 | 31 | NULL | NULL | samples in rows, need to transpose | |
| BEC_P_TMT | ./rawdata/Proteomics_TMT_BECs_20160604_FINAL.xlsx | data | 1137 | 2 | 91 | log2 | ratio | | |
| clinic_bioconductor | ./rawdata/COSMIC clinical_2016-03-31_selected.csv | Null | NA | NA | NA | | | | |
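The scol and ecol columns appear to give the first and last data column of each sheet as 1-based, inclusive positions. For BAL_P_DIGE, columns 2 to 78 span 77 columns. The Python sketch below illustrates that reading on a single row. It is not the project's R code, `select_sample_columns` is a hypothetical helper, and treating a missing bound (NA) as "no limit on that side" is an assumption.

```python
def select_sample_columns(row, scol=None, ecol=None):
    """Return the cells in columns scol..ecol (1-based, inclusive).

    Assumption: a missing bound (None, standing in for NA in the table)
    means the selection is open on that side.
    """
    start = 0 if scol is None else scol - 1  # convert to 0-based index
    # A 1-based inclusive end equals a 0-based exclusive end, so ecol
    # can be used directly as the slice stop.
    return row[start:ecol]


# Dummy row with 130 cells named c1..c130.
row = [f"c{i}" for i in range(1, 131)]

dige = select_sample_columns(row, scol=2, ecol=78)
print(len(dige), dige[0], dige[-1])
```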
All data are transformed to log2-based values.
| item | nrow/features | ncol/sample | unique_features | missing_values | data type |
|---|---|---|---|---|---|
| HLA_typing | 32 | 118 | 32 | 0 | integer |
| BAL_T_mRNA | 41000 | 51 | 41000 | 0 | numeric |
| BAL_P_DIGE | 404 | 77 | 108 | 0 | numeric |
| BAL_T_miR | 880 | 86 | 880 | 0 | numeric |
| BEC_T_miR | 880 | 63 | 880 | 0 | numeric |
| EXO_T_miR | 1212 | 64 | 1212 | 0 | numeric |
| BALF_M_Oxylip | 45 | 114 | 45 | 0 | numeric |
| Serum_M_Oxylip | 74 | 115 | 74 | 0 | numeric |
| Serum_M_Non_targeted | 1103 | 116 | 1103 | 0 | numeric |
| Serum_M_Biocrates | 182 | 76 | 182 | 0 | numeric |
| Serum_M_Kynurenine | 4 | 115 | 4 | 0 | numeric |
| Serum_M_Sphingolipid | 29 | 115 | 29 | 0 | numeric |
| BAL_P_iTRAQ_ONE_impute | 1266 | 69 | 1266 | 0 | numeric |
| BAL_P_iTRAQ_ALL_impute | 939 | 69 | 939 | 0 | numeric |
| BEC_P_TMT_impute | 1136 | 90 | 1136 | 0 | numeric |
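The file list records BAL_P_DIGE as log10 ratios, while all data are reported on a log2 scale. The change of base follows from log2(x) = log10(x) / log10(2). The Python sketch below shows only this identity. It is not the project's R code, and whether preprocessing.R performs the conversion in exactly this way is an assumption. `log10_to_log2` is a hypothetical helper name.

```python
import math


def log10_to_log2(values):
    """Convert log10-scale values to the log2 scale.

    Uses the change-of-base identity log2(x) = log10(x) / log10(2).
    """
    factor = math.log10(2)
    return [v / factor for v in values]


# A ratio of 8 is log10(8) on the log10 scale and 3 on the log2 scale;
# a ratio of 1 is 0 on both scales.
log10_ratios = [math.log10(8), 0.0, math.log10(0.5)]
print(log10_to_log2(log10_ratios))
```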
The code is in preprocessing.R. After COSMIC clinical_2016-03-31_selected.csv is read in, a column named barcode is added.
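As an illustration of how such a barcode column could be derived, the Python sketch below joins the five fields of the sample barcode format for one participant. It is not the code in preprocessing.R. The function name and its arguments are hypothetical, and it assumes the clinical record already carries the single-letter codes and that fields are joined by underscores as in COSMIC_1101_M_N_H.

```python
# Allowed single-letter codes, taken from the sample barcode definition.
GENDER_CODES = {"M", "F"}
SMOKING_CODES = {"N", "S", "E"}
DIAGNOSIS_CODES = {"H", "C"}


def make_barcode(participant, gender, smoking, diagnosis, project="COSMIC"):
    """Build a sample barcode such as COSMIC_1101_M_N_H."""
    if gender not in GENDER_CODES:
        raise ValueError(f"unknown gender code: {gender!r}")
    if smoking not in SMOKING_CODES:
        raise ValueError(f"unknown smoking code: {smoking!r}")
    if diagnosis not in DIAGNOSIS_CODES:
        raise ValueError(f"unknown diagnosis code: {diagnosis!r}")
    # str() allows numeric participant IDs as read from a CSV file.
    return "_".join([project, str(participant), gender, smoking, diagnosis])


print(make_barcode(1101, "M", "N", "H"))
```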